Canadian Birth Probability Analysis

By Nicole Bidwell

Introduction


This analysis explores the chance of being born in Canada in a particular year. A description of how to reproduce the analysis by running the data pipeline can be found in the README.md file. The outline below serves as a summary, explanation, and interpretation of the analysis.

Data


The data used for this analysis is the provided birth rate and population data obtained through the World Bank's API.

The Chosen Years

This analysis explores the years 2010 to 2023. An example calculation for the probability of being born in Canada for the year 2012 is included, along with the changes in probability over time across all countries.

Retrieving and Loading the Data

The script retrieve_load.py, found in the src folder, obtains the required data from the World Bank API in JSON format. It pulls the data from 2010 to 2023 across multiple pages for both birth rate and population, then uses sqlite3 to create a table, load the data into the database, and query for the required subset of the data.
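The multi-page retrieval can be sketched as follows. This is a minimal illustration, not the code in retrieve_load.py: the names `fetch_all_pages` and `make_world_bank_fetcher` are hypothetical, and the sketch assumes each response decodes to a two-element list of metadata and records, with the total page count under `"pages"`. The page fetcher is passed in as a function so the loop can be exercised offline.

```python
import json
from typing import Callable, List
from urllib.request import urlopen

# Assumption: each response body decodes to [metadata, records], where
# metadata["pages"] holds the total number of pages.
PageFetcher = Callable[[int], list]


def fetch_all_pages(get_page: PageFetcher) -> List[dict]:
    """Collect the records from every page, starting at page 1."""
    records: List[dict] = []
    page = 1
    while True:
        metadata, rows = get_page(page)
        records.extend(rows or [])  # tolerate an empty (None) page
        if page >= int(metadata["pages"]):
            return records
        page += 1


def make_world_bank_fetcher(indicator: str) -> PageFetcher:
    """Build a fetcher for one indicator (hypothetical helper, needs network)."""

    def get_page(page: int) -> list:
        url = (
            "https://api.worldbank.org/v2/country/all/indicator/"
            f"{indicator}?format=json&date=2010:2023&page={page}"
        )
        with urlopen(url) as response:
            return json.load(response)

    return get_page


# Offline demonstration with two fake pages standing in for the API.
fake_pages = {
    1: [{"page": 1, "pages": 2}, [{"date": "2010", "value": 1.0}]],
    2: [{"page": 2, "pages": 2}, [{"date": "2011", "value": 2.0}]],
}
all_records = fetch_all_pages(lambda page: fake_pages[page])
```

Assuming the crude birth rate and total population indicators (`SP.DYN.CBRT.IN` and `SP.POP.TOTL`) are the ones of interest, a live call would look like `fetch_all_pages(make_world_bank_fetcher("SP.DYN.CBRT.IN"))`. The sketch does no error handling, so a malformed or error response would raise.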

When querying for the required subset of data, it is important to filter for valid countries, since the original data also includes entries for countries grouped into specific regions. Including these regions in the processed data would have resulted in overcounting in the later calculations of total births.
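The effect of that filter can be illustrated with a small in-memory sqlite3 table. This is only a sketch under assumptions: the table name, the column names, and the `is_aggregate` flag are hypothetical (the actual schema in retrieve_load.py may differ), and the rows are made-up placeholder values rather than World Bank data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE births (iso3 TEXT, name TEXT, is_aggregate INTEGER, "
    "year INTEGER, birth_rate REAL, population REAL)"
)

# Placeholder rows: two countries plus one regional aggregate that already
# contains both of them (hypothetical values for illustration only).
rows = [
    ("AAA", "Country A", 0, 2012, 10.0, 1_000_000),
    ("BBB", "Country B", 0, 2012, 20.0, 3_000_000),
    ("RGN", "Region of A and B", 1, 2012, 17.5, 4_000_000),
]
conn.executemany("INSERT INTO births VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep only real countries so regional totals are not counted twice.
countries_only = conn.execute(
    "SELECT iso3, name, year, birth_rate, population FROM births "
    "WHERE is_aggregate = 0 ORDER BY iso3"
).fetchall()

population_with_regions = conn.execute(
    "SELECT SUM(population) FROM births"
).fetchone()[0]
population_countries = conn.execute(
    "SELECT SUM(population) FROM births WHERE is_aggregate = 0"
).fetchone()[0]
```

In this toy table the unfiltered population sum is double the filtered one, which is the kind of overcounting the filter avoids.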

The Processed Data

After querying for the required data, a pandas dataframe is created which contains the country information (ISO3 code, id, and name), year, birth rate, and population. The data is saved as a csv file, country_br_pop.csv, in the data folder for later use.

Calculated Values


The script calculate_probabilities.py in the src folder is used to perform the probability calculations.

Number of Births

After loading the country_br_pop.csv data, a column birth is added to the data frame. This provides the number of births in each year for each country, using the formula:

$$\text{Number of Births} = \frac{\text{Birth Rate}}{1000}\times\text{Population}$$

These values are used in the following calculations.
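As a quick check of the formula, here is a minimal sketch in plain Python (kept free of pandas so it runs without dependencies). The function name `number_of_births` is hypothetical, and it assumes the birth rate is expressed per 1,000 people, as the division by 1000 in the formula implies.

```python
def number_of_births(birth_rate_per_1000: float, population: float) -> float:
    """Estimate births in a year from a per-1,000 birth rate and a population."""
    return birth_rate_per_1000 / 1000 * population


# Illustrative inputs only: a rate of 10 per 1,000 in a population of 1,000,000.
example_births = number_of_births(10.0, 1_000_000)
```

With a pandas dataframe, the same calculation would typically be written as one vectorized expression over the birth rate and population columns.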

Probability of Being Born in Canada for 2012

To calculate the probability of being born in Canada for a specified year I created the function calc_probability_country. This function calculates the percentage probability of being born in any specified country for any specified year within the dataset. The two formulas used are:

$$\text{Total Worldwide Births in the Year} = \text{sum of all countries' births in the year}$$

$$\text{Percentage Probability for a Country} = \frac{\text{Country's Number of Births in the Year}}{\text{Total Worldwide Births in the Year}}\times 100$$

To calculate the probability of being born in Canada for 2012, the function is called with Canada for the country parameter and 2012 for the year parameter. For a more tangible interpretation, I included the equivalent ratio using the formula:

$$\text{Ratio Value} = \frac{1}{\text{Percentage Probability}}\times100$$

These values are saved in the output folder.
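The three formulas in this section can be sketched together in a few lines. This is not the calc_probability_country function itself: the names `percent_probability` and `one_in_n` are hypothetical, and the sketch assumes births are stored in a dictionary keyed by (country, year), with made-up values for two placeholder countries.

```python
from typing import Dict, Tuple

# Assumed layout: births[(country, year)] = number of births.
Births = Dict[Tuple[str, int], float]


def percent_probability(births: Births, country: str, year: int) -> float:
    """Share of the year's worldwide births that occurred in one country, in %."""
    total = sum(value for (_, y), value in births.items() if y == year)
    return births[(country, year)] / total * 100


def one_in_n(percent: float) -> float:
    """Convert a percentage probability into the N of a '1 in N' ratio."""
    return 1 / percent * 100


# Placeholder data: country "A" has 1 of the 4 births recorded for 2012.
sample_births: Births = {("A", 2012): 1.0, ("B", 2012): 3.0, ("A", 2013): 5.0}
share_a_2012 = percent_probability(sample_births, "A", 2012)
ratio_a_2012 = one_in_n(share_a_2012)
```

Applying `one_in_n` to the reported $0.261\%$ gives about 383, matching the ratio quoted in the results.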

Probability of Being Born in any Specified Country for any Specified Year

The calc_probability_country function was also used to calculate the probabilities of being born in all other countries in the dataset for each year. These values are saved in the csv file, countries_prob.csv, in the data folder.

Global Average Number of Births per Year

The last value I calculated was the global average number of births per year. Obtained using the following formula, this value later provides insight when interpreting the differences in birth probabilities from year to year.

$$\text{Global Average Number of Births per Year} = \frac{\sum{\text{(Total Births per Year)}}}{\text{Total Number of Years}}$$
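A minimal sketch of this average, again in plain Python: births are first summed per year, and the yearly totals are then averaged. The function name `global_average_births` and the (country, year) dictionary layout are assumptions made for illustration, and the sample values are placeholders.

```python
from collections import defaultdict
from typing import Dict, Tuple


def global_average_births(births: Dict[Tuple[str, int], float]) -> float:
    """Average the worldwide yearly birth totals over all years present."""
    totals_per_year: Dict[int, float] = defaultdict(float)
    for (_country, year), value in births.items():
        totals_per_year[year] += value
    return sum(totals_per_year.values()) / len(totals_per_year)


# Placeholder data: yearly totals are 4 (2010) and 6 (2011).
sample = {("A", 2010): 1.0, ("B", 2010): 3.0, ("A", 2011): 6.0}
average = global_average_births(sample)
```

Note that the denominator is the number of years, not the number of country-year rows, which is what distinguishes this from a plain mean of the births column.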

Results and Interpretation


Probability of Being Born in Canada in 2012

The probability of being born in Canada in 2012 is $0.261\%$, which indicates that roughly 1 out of every 383 people born in 2012 was born in Canada.

Data Visualization and Interpretation

The script graphs.py in the src folder is used to generate plots using Plotly Graph Objects and Plotly Express, which are later saved in the output folder. Each plot has a corresponding function in the script that was used to generate the plot. These plots allow for easier interpretation and deeper analysis of the birth probabilities.

1. Canada Bar Chart for 2012

Script function name: canada_bar_chart.

This plot displays the probability of being born in Canada in 2012. It provides a straightforward visual comparison between the probability of being born in Canada and elsewhere in 2012. When hovering over the bars we can confirm the exact values.

[Figure: bar chart of the probability of being born in Canada versus elsewhere in 2012]

Here we see the probability of being born in Canada in 2012 appears to be small. Further analysis provides more meaningful insight.

2. Canada Trend Line Over Time

Script function name: canada_timeline.

This plot displays the change in probability of being born in Canada from 2010 to 2023.

[Figure: trend line of the probability of being born in Canada, 2010 to 2023]

Here we see the 2012 probability of $0.261\%$ is the minimum value over the period from 2010 to 2023. Notably, the maximum probability is $0.278\%$, which occurred in 2021. This gives a range of $0.278 - 0.261 = 0.017$ percentage points. This difference may seem small, but when we consider the global average number of births per year, calculated to be roughly $140,039,787.23$, a difference of $0.017$ percentage points means roughly $23806.76$* more people were born in Canada in 2021 compared to 2012.

*calculated by $0.017/100\times140,039,787.23 = 23806.76$

3. Top 5, Bottom 5, and Canada Time Line

Script function name: country_timeline.

Similar to the Canada Trend Line Over Time, this plot shows probability trend lines from 2010 to 2023 for additional countries. The plot includes the 5 countries with the highest average probability (India, China, Nigeria, Pakistan, and Indonesia) and the 5 countries with the lowest average probability (Nauru, British Virgin Islands, San Marino, Tuvalu, and Palau), along with Canada for comparison.
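One way to pick the highest and lowest countries by average probability is sketched below. This is an assumption about the selection step, not the code in graphs.py: the function name `rank_by_average` and the (country, year) dictionary layout are hypothetical, and the sample values are placeholders.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_average(
    probabilities: Dict[Tuple[str, int], float], n: int = 5
) -> Tuple[List[str], List[str]]:
    """Return the n countries with the highest and lowest mean probability."""
    by_country: Dict[str, List[float]] = defaultdict(list)
    for (country, _year), value in probabilities.items():
        by_country[country].append(value)
    # Sort country names from highest to lowest average probability.
    ranked = sorted(by_country, key=lambda c: mean(by_country[c]), reverse=True)
    return ranked[:n], ranked[-n:]


# Placeholder probabilities (in %) for three countries over two years.
sample = {
    ("A", 2010): 50.0, ("A", 2011): 54.0,
    ("B", 2010): 30.0, ("B", 2011): 28.0,
    ("C", 2010): 20.0, ("C", 2011): 18.0,
}
top, bottom = rank_by_average(sample, n=1)
```

If the data contains fewer than `2 * n` countries, the two returned lists overlap, so a caller would want to guard against that before plotting.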

The values in the legend can be clicked to better display overlapping trend lines.

[Figure: probability trend lines for the top 5 countries, the bottom 5 countries, and Canada, 2010 to 2023]

While Canada is not one of the countries with the lowest birth probabilities, it remains closer to the bottom than the top. From this graph it is also evident that many of the countries have relatively stable birth probabilities over the period, except for China and India. In China we see a downward trend after 2017. In India we see a slight downward trend from 2010 to 2014, followed by more stability in the following years.

Conclusions


This analysis found that the probability of being born in Canada in 2012 is $0.261\%$ (or 1 in 383). I also dug deeper to gain insight into the changes in birth probabilities over the period from 2010 to 2023. Interpreting the trend lines shows that the biggest change in birth probabilities in Canada is between 2012 and 2021, from $0.261\%$ to $0.278\%$. This change of $0.017$ percentage points equates to roughly $23806.76$ more people being born in Canada in 2021 compared to 2012. That said, Canada's birth probabilities remained relatively stable compared to countries like China and India, which had the highest birth probabilities but also displayed more fluctuation.

The most notable fluctuation was the drop in China's birth probabilities from 2019 onwards. Potential impacts of Covid-19 could have contributed to this drop. This analysis could be extended by widening the time period. Observing a wider time period would allow for deeper insight and would likely reveal more fluctuation in the trend lines.